Reinforcing Parser Preferences through Tagging
Abstract
Lexical ambiguity is an important source of inefficiency in wide-coverage HPSG parsing. In this paper, we propose a lexical analysis filter that removes unlikely lexical categories. The filter is implemented as a straightforward HMM n-gram POS tagger, which computes the a posteriori probability of each lexical category. A lexical category is removed if a competing lexical category is sufficiently more likely. The novel aspect of our approach is that the tagger is trained on the output of the parser itself; therefore, no hand-annotated material is needed. Using this filter increases the speed of the parser considerably and, in addition, improves parsing accuracy.
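The filter described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a bigram HMM (the abstract only says "n-gram") with add-one smoothing. Posteriors come from forward-backward, and a category is dropped when some competitor is more than a fixed factor more likely. The function names `train`, `make_model`, `posteriors` and `filter_categories`, the toy "parser output", the candidate sets, and the threshold value 10.0 are all hypothetical.

```python
from collections import defaultdict


def train(tagged_sentences):
    """Count bigram transitions and emissions from (word, category) sequences.

    Assumption: each sequence is the category assignment of the parser's own
    best parse, so no hand-annotated corpus is required.
    """
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sent in tagged_sentences:
        prev = "<s>"
        for word, cat in sent:
            trans[prev][cat] += 1
            emit[cat][word] += 1
            prev = cat
        trans[prev]["</s>"] += 1
    return trans, emit


def make_model(trans, emit):
    """Return add-one smoothed transition and emission probability functions."""
    tags = set(emit) | {"</s>"}
    vocab = {w for counts in emit.values() for w in counts}

    def p_trans(prev, cat):
        row = trans.get(prev, {})
        return (row.get(cat, 0) + 1) / (sum(row.values()) + len(tags))

    def p_emit(cat, word):
        row = emit.get(cat, {})
        # +1 in the denominator reserves mass for unseen words.
        return (row.get(word, 0) + 1) / (sum(row.values()) + len(vocab) + 1)

    return p_trans, p_emit


def posteriors(words, candidates, p_trans, p_emit):
    """Forward-backward restricted to each word's candidate categories.

    Plain probabilities are used for brevity; long sentences would need
    log-space or scaled arithmetic to avoid underflow.
    """
    n = len(words)
    alpha = [dict() for _ in range(n)]
    beta = [dict() for _ in range(n)]
    for c in candidates[0]:
        alpha[0][c] = p_trans("<s>", c) * p_emit(c, words[0])
    for i in range(1, n):
        for c in candidates[i]:
            alpha[i][c] = p_emit(c, words[i]) * sum(
                alpha[i - 1][p] * p_trans(p, c) for p in candidates[i - 1])
    for c in candidates[n - 1]:
        beta[n - 1][c] = p_trans(c, "</s>")
    for i in range(n - 2, -1, -1):
        for c in candidates[i]:
            beta[i][c] = sum(
                p_trans(c, d) * p_emit(d, words[i + 1]) * beta[i + 1][d]
                for d in candidates[i + 1])
    z = sum(alpha[n - 1][c] * beta[n - 1][c] for c in candidates[n - 1])
    return [{c: alpha[i][c] * beta[i][c] / z for c in candidates[i]}
            for i in range(n)]


def filter_categories(post, threshold):
    """Drop a category if some competitor is more than `threshold` times as likely."""
    kept = []
    for dist in post:
        best = max(dist.values())
        kept.append({c for c, p in dist.items() if p * threshold >= best})
    return kept


# Toy "parser output": category sequences from best parses (invented data).
parsed = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]
p_trans, p_emit = make_model(*train(parsed))
words = ["the", "dog", "sleeps"]
# Hypothetical lexicon lookup: every category the grammar allows per word.
cands = [{"DET"}, {"NOUN", "VERB"}, {"VERB", "NOUN"}]
post = posteriors(words, cands, p_trans, p_emit)
print(filter_categories(post, threshold=10.0))
# → [{'DET'}, {'NOUN'}, {'VERB'}]
```

In this toy run, NOUN is 22 times as likely as VERB for "dog", so VERB is removed at that position. The design keeps more than one category wherever the ratio stays under the threshold. That leaves genuinely ambiguous cases to the parser rather than forcing a single tag.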
Similar resources
SynCoP – Combining Syntactic Tagging with Chunking Using Weighted Finite State Transducers
This paper describes the key aspects of the system SynCoP (Syntactic Constraint Parser) developed at the Berlin-Brandenburgische Akademie der Wissenschaften. The parser makes it possible to combine syntactic tagging and chunking by means of constraint grammar using weighted finite state transducers (WFST). Chunks are interpreted as local dependency structures within syntactic tagging. The linguistic theor...
A Comparative Study of the Impact of Part-of-Speech Tagging on Parsing in Automatic Processing of Persian
In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging and of the quantity of information available in the POS tags on parsing is studied. To reach these goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...
A Statistics-Based Chinese Parser
This paper describes a statistics-based Chinese parser, which parses Chinese sentences with correct segmentation and POS tagging information through the following processing stages: (1) predicting constituent boundaries, (2) matching open and close brackets to produce syntactic trees, and (3) disambiguating and choosing the best parse tree. Evaluating the parser against a smaller Chinese treebank ...
How to Decrease the Performance of a Statistical Parser
This paper is a study of outside factors that may have a negative impact on a parser’s accuracy. The discussed phenomena can be divided into two classes, those related to treebank design, and those related to morphological tagging. Although the scope of the paper is limited to the Prague Dependency Treebank, two particular taggers and one particular parser, we believe that our observations may ...
Part-of-speech tagging and chunk parsing of spoken Dutch using support vector machines
This paper describes the design and evaluation of a part-of-speech tagger and chunk parser for spoken Dutch, using support vector machines. The data in the Corpus Gesproken Nederlands is split into smaller subproblems to obtain reasonable training and tagging speed using various kernel types. The tagger combines good accuracy with reasonable tagging speed. The chunk parser shows good accuracy, ...